31 research outputs found

    Improving unsupervised learning with exemplarCNNs

    Get PDF
    Most recent unsupervised learning methods explore alternative objectives, often referred to as self-supervised tasks, to train convolutional neural networks without the supervision of human annotated labels. This paper explores the generation of surrogate classes as a self-supervised alternative to learn discriminative features, and proposes a clustering algorithm to overcome one of the main limitations of this kind of approach. Our clustering technique improves the initial implementation and achieves 76.4% accuracy in the STL-10 test set, surpassing the current state-of-the-art for the STL-10 unsupervised benchmark. We also explore several issues with the unlabeled set from STL-10 that should be considered in future research using this dataset

    Visual representation learning with deep neural networks under label and budget constraints

    Get PDF
    This thesis presents the work done in the area of semi-supervised learning, label noise, and budgeted training for deep learning approaches to computer vision. The improvements seen in computer vision since the successful introduction of deep learning rely on the availability of large amounts of labeled data and long lasting training processes. First, this research studies the three main alternatives to fully supervised deep learning categorized in three different levels of supervision: unsupervised learning (no label involved), semi-supervised learning (a small set of labeled data is available), and label noise (all the samples are labeled but some of them are incorrect). These alternatives aim at reducing the cost of building fully annotated and finely curated datasets, which in most cases is time consuming and requires expert annotators. State-of-the-art performance has been achieved in several semi-supervised, unsupervised, and label noise benchmarks including CIFAR10, CIFAR100, and STL-10. Additionally, the solutions proposed for learning in the presence of label noise have been validated in realistic benchmarks built with datasets annotated from web information: WebVision and Clothing1M. Second, this research explores alternatives to reduce the computational cost of the training of deep learning systems that currently require hours or days to reach state-of-the-art performance. Particularly, this research studied budgeted training, i.e.~when the training process is limited to a fixed number of iterations. Experiments in this setup showed that for better model convergence, variety in the data is preferable than the importance of the samples used during training. As a result of this research, three main author publications have been generated, one more has been recently submitted to review for a conference, and several other secondary author publications have been produced in close collaboration with other researchers in the centre

    Improving Unsupervised Learning With Exemplar CNNS

    Get PDF
    Most recent unsupervised learning methods explore alternative objectives, often referred to as self-supervised tasks, to train convolutional neural networks without the supervision of human annotated labels. This paper explores the generation of surrogate classes as a self-supervised alternative to learn discriminative features, and proposes a clustering algorithm to overcome one of the main limitations of this kind of approach. Our clustering technique improves the initial implementation and achieves 76.4% accuracy in the STL-10 test set, surpassing the current state-ofthe- art for the STL-10 unsupervised benchmark. We also explore several issues with the unlabeled set from STL-10 that should be considered in future research using this dataset

    Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

    Full text link
    Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset, but their main drawback remains the proportion of incorrect (noisy) samples retrieved. These noisy samples have been evidenced by previous works to be a mixture of in-distribution (ID) samples, assigned to the incorrect category but presenting similar visual semantics to other classes in the dataset, and out-of-distribution (OOD) images, which share no semantic correlation with any category from the dataset. The latter are, in practice, the dominant type of noisy images retrieved. To tackle this noise duality, we propose a two stage algorithm starting with a detection step where we use unsupervised contrastive feature learning to represent images in a feature space. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere. We then spectrally embed the unsupervised representations using a fixed neighborhood size and apply an outlier sensitive clustering at the class level to detect the clean and OOD clusters as well as ID noisy outliers. We finally train a noise robust neural network that corrects ID noise to the correct category and utilizes OOD samples in a guided contrastive objective, clustering them to improve low-level features. Our algorithm improves the state-of-the-art results on synthetic noise image datasets as well as real-world web-crawled data. Our work is fully reproducible github.com/PaulAlbert31/SNCF.Comment: Accepted at ECCV 202

    Joint one-sided synthetic unpaired image translation and segmentation for colorectal cancer prevention

    Full text link
    Deep learning has shown excellent performance in analysing medical images. However, datasets are difficult to obtain due privacy issues, standardization problems, and lack of annotations. We address these problems by producing realistic synthetic images using a combination of 3D technologies and generative adversarial networks. We propose CUT-seg, a joint training where a segmentation model and a generative model are jointly trained to produce realistic images while learning to segment polyps. We take advantage of recent one-sided translation models because they use significantly less memory, allowing us to add a segmentation model in the training loop. CUT-seg performs better, is computationally less expensive, and requires less real images than other memory-intensive image translation approaches that require two stage training. Promising results are achieved on five real polyp segmentation datasets using only one real image and zero real annotations. As a part of this study we release Synth-Colon, an entirely synthetic dataset that includes 20000 realistic colon images and additional details about depth and 3D geometry: https://enric1994.github.io/synth-colonComment: arXiv admin note: substantial text overlap with arXiv:2202.0868

    Self-Supervised and Semi-Supervised Polyp Segmentation using Synthetic Data

    Full text link
    Early detection of colorectal polyps is of utmost importance for their treatment and for colorectal cancer prevention. Computer vision techniques have the potential to aid professionals in the diagnosis stage, where colonoscopies are manually carried out to examine the entirety of the patient's colon. The main challenge in medical imaging is the lack of data, and a further challenge specific to polyp segmentation approaches is the difficulty of manually labeling the available data: the annotation process for segmentation tasks is very time-consuming. While most recent approaches address the data availability challenge with sophisticated techniques to better exploit the available labeled data, few of them explore the self-supervised or semi-supervised paradigm, where the amount of labeling required is greatly reduced. To address both challenges, we leverage synthetic data and propose an end-to-end model for polyp segmentation that integrates real and synthetic data to artificially increase the size of the datasets and aid the training when unlabeled samples are available. Concretely, our model, Pl-CUT-Seg, transforms synthetic images with an image-to-image translation module and combines the resulting images with real images to train a segmentation model, where we use model predictions as pseudo-labels to better leverage unlabeled samples. Additionally, we propose PL-CUT-Seg+, an improved version of the model that incorporates targeted regularization to address the domain gap between real and synthetic images. The models are evaluated on standard benchmarks for polyp segmentation and reach state-of-the-art results in the self- and semi-supervised setups

    Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning

    Get PDF
    Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role on relaxing human supervision. In the context of image classification, recent advances to learn from unlabeled samples are mainly focused on consistency regularization methods that encourage invariant predictions for different perturbations of unlabeled samples. We, conversely, propose to learn from unlabeled data by generating soft pseudo-labels using the network predictions. We show that a naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias and demonstrate that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it. The proposed approach achieves state-of-the-art results in CIFAR-10/100, SVHN, and Mini-ImageNet despite being much simpler than other methods. These results demonstrate that pseudo-labeling alone can outperform consistency regularization methods, while the opposite was supposed in previous work. Source code is available at https://git.io/fjQsC

    Reliable Label Bootstrapping for Semi-Supervised Learning

    Get PDF
    Reducing the amount of labels required to train convolutional neural networks without performance degradation is key to effectively reduce human annotation efforts. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised preprossessing algorithm which improves the performance of semi-supervised algorithms in extremely low supervision settings. Given a dataset with few labeled samples, we first learn meaningful self-supervised, latent features for the data. Second, a label propagation algorithm propagates the known labels on the unsupervised features, effectively labeling the full dataset in an automatic fashion. Third, we select a subset of correctly labeled (reliable) samples using a label noise detection algorithm. Finally, we train a semi-supervised algorithm on the extended subset. We show that the selection of the network architecture and the self-supervised algorithm are important factors to achieve successful label propagation and demonstrate that ReLaB substantially improves semi-supervised learning in scenarios of very limited supervision on CIFAR-10, CIFAR-100 and mini-ImageNet. We reach average error rates of 22.34\boldsymbol{22.34} with 1 random labeled sample per class on CIFAR-10 and lower this error to 8.46\boldsymbol{8.46} when the labeled sample in each class is highly representative. Our work is fully reproducible: https://github.com/PaulAlbert31/ReLaB.Comment: 10 pages, 3 figure
    corecore